Add benchmarks from HN blog post #260

Merged
merged 2 commits into from
Jan 8, 2024

maximecb
Contributor

@maximecb maximecb commented Jan 5, 2024

The other day I saw this HN blog post about benchmarking 20 programming languages, in which they had benchmarked Python and Ruby (with YJIT!) among many others. The good news is that Ruby with YJIT greatly outperformed Python. The less good news is that we're still far behind other languages including PHP.

I asked the author for permission to include the Ruby benchmarks that he used. These are synthetic with big loops. I think they could be useful to benchmark changes we make to the register allocator. It would also be interesting to see if we can make performance on the matrix multiplication benchmark go up significantly by implementing floating point addition and multiplication.

-------  -----------  ----------  ---------  ----------  ------------  -----------
bench    interp (ms)  stddev (%)  yjit (ms)  stddev (%)  yjit 1st itr  interp/yjit
matmul   3408.0       0.6         1537.8     0.1         2.20          2.22       
nqueens  879.6        0.8         880.1      0.1         1.00          1.00       
sudoku   4054.0       0.1         1574.5     0.1         2.59          2.57       
-------  -----------  ----------  ---------  ----------  ------------  -----------

It's also interesting that we see zero speed boost on nqueens even though --yjit-stats reports zero exits. Suggests there is an exit that we're not catching/counting somewhere @k0kubun @XrXr

total_insns_count:       182,080,967
vm_insns_count:          182,071,959
yjit_insns_count:              9,008
ratio_in_yjit:                  0.0%
avg_len_in_yjit:                 9.5
total_exits:                    0

@maximecb maximecb self-assigned this Jan 5, 2024
@maximecb maximecb requested review from nirvdrum, k0kubun and XrXr January 5, 2024 21:23
@maximecb
Contributor Author

maximecb commented Jan 5, 2024

@nirvdrum you can probably get a 10-20x speedup on these with TruffleRuby :)

@k0kubun
Member

k0kubun commented Jan 5, 2024

@eregon It appears that TruffleRuby no longer works correctly with cool.io https://github.com/Shopify/yjit-bench/actions/runs/7426843741/job/20211367451. It doesn't seem like the problem of yjit-bench but TruffleRuby's. Could you fix this issue?

@k0kubun
Member

k0kubun commented Jan 5, 2024

> It's also interesting that we see zero speed boost on nqueens even though --yjit-stats reports zero exits. Suggests there is an exit that we're not catching/counting somewhere

nq_solve is called only num_itrs times. run_benchmarks.rb runs each benchmark for only 25 itrs, so it's called only 25 times. Thus nq_solve is never JIT-ed with the default --yjit-call-threshold=30. I think you want to change how the method is executed in the script.
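To make the call-count argument concrete, here is a minimal sketch. It is not code from yjit-bench or from the nqueens benchmark: the method names, the workload, and the `$calls` counter are all hypothetical, and the two constants simply restate the numbers quoted above. It contrasts a solver whose work sits in `while` loops inside one method (called once per iteration) with one that delegates to a small helper (called many times per iteration).

```ruby
# Illustrative stand-ins only: method names and workload are assumptions,
# not code from yjit-bench or the nqueens benchmark.
CALL_THRESHOLD = 30 # default --yjit-call-threshold quoted above
NUM_ITRS = 25       # iterations per benchmark quoted above

$calls = Hash.new(0) # hypothetical per-method call counter

# Shape 1: all the work happens in loops inside a single method.
def solve_monolithic(n)
  $calls[:solve_monolithic] += 1
  total = 0
  i = 0
  while i < n
    j = 0
    while j < n
      total += (i ^ j) & 1
      j += 1
    end
    i += 1
  end
  total
end

# Shape 2: the inner work is a helper method called n * n times per run.
def parity(i, j)
  $calls[:parity] += 1
  (i ^ j) & 1
end

def solve_split(n)
  $calls[:solve_split] += 1
  total = 0
  i = 0
  while i < n
    j = 0
    while j < n
      total += parity(i, j)
      j += 1
    end
    i += 1
  end
  total
end

NUM_ITRS.times { solve_monolithic(8) }
NUM_ITRS.times { solve_split(8) }

$calls.each do |name, count|
  status = count >= CALL_THRESHOLD ? "reaches" : "stays below"
  puts "#{name}: #{count} calls, #{status} the threshold of #{CALL_THRESHOLD}"
end
```

In this sketch only the helper crosses the threshold; the outer methods are called 25 times each, which matches the situation described for nq_solve.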

With --yjit-call-threshold=1, total_exits is not 0. But the method seems to exit on opt_ltlt and that seems to make ratio_in_yjit 0.0%, so YJIT doesn't do well on that benchmark even with a low call threshold.

@k0kubun
Member

k0kubun commented Jan 5, 2024

> exit on opt_ltlt

left shift (ltlt) exit reasons:
    amt_changed:          1 (100.0%)

Ah, so we expected a left shift to not have a dynamic amount, and it did in this method. The amount can be as large as 11, so if we chain guard more than 11 times and/or fall back to a slower implementation instead of exiting there, we wouldn't exit.

edit: I tried chain-guarding 11 times, but falling back immediately after the first chain was faster.
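For readers unfamiliar with what a "dynamic amount" looks like in Ruby source, here is a small sketch. It is not the benchmark's nq_solve: `count_queens` is a hypothetical bitmask N-Queens counter written for illustration, and its shift amounts are not the same as the benchmark's. The point is only that the right-hand side of `<<` changes from one execution of the same call site to the next.

```ruby
# Hypothetical bitmask N-Queens counter, written for illustration only.
# Every `1 << amount` below uses an amount that varies between executions
# of the same call site, which is the "dynamic amount" case discussed above.
def count_queens(n, row = 0, cols = 0, diag1 = 0, diag2 = 0)
  return 1 if row == n

  count = 0
  n.times do |col|
    col_bit = 1 << col                 # amount varies with col
    d1_bit  = 1 << (row + col)         # amount varies with row and col
    d2_bit  = 1 << (row - col + n - 1) # offset keeps the amount non-negative

    # Skip squares attacked by a queen placed on an earlier row.
    next if (cols & col_bit) != 0 || (diag1 & d1_bit) != 0 || (diag2 & d2_bit) != 0

    count += count_queens(n, row + 1, cols | col_bit, diag1 | d1_bit, diag2 | d2_bit)
  end
  count
end

puts count_queens(8) # prints 92, the known number of 8-queens solutions
```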

@noahgibbs
Contributor

noahgibbs commented Jan 8, 2024

Useful but not mandatory: deterministic solvers like this are great for doing a quick check after the run_benchmark() loop to make sure we're getting the right result (after is better than before -- we want to make sure we're checking the answer with JITted code). That way if something starts returning the wrong answer we'll know really quickly.
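A minimal sketch of that suggestion, under stated assumptions: `run_benchmark_stub` is a hypothetical stand-in for the harness loop (the real run_benchmark lives in yjit-bench and is not reproduced here), and `matmul` is a small illustrative solver rather than the benchmark's code. The check runs after the loop, so the call being verified is one made after warm-up.

```ruby
# Hypothetical stand-in for the harness loop; not the yjit-bench implementation.
def run_benchmark_stub(num_itrs)
  num_itrs.times { yield }
end

# Small illustrative solver: naive matrix multiplication on nested arrays.
def matmul(a, b)
  rows = a.size
  inner = b.size
  cols = b[0].size
  c = Array.new(rows) { Array.new(cols, 0.0) }
  rows.times do |i|
    inner.times do |k|
      aik = a[i][k]
      cols.times { |j| c[i][j] += aik * b[k][j] }
    end
  end
  c
end

MAT_A = [[1.0, 2.0], [3.0, 4.0]]
MAT_B = [[5.0, 6.0], [7.0, 8.0]]
EXPECTED = [[19.0, 22.0], [43.0, 50.0]]

run_benchmark_stub(25) { matmul(MAT_A, MAT_B) }

# Verify after the loop, so the checked call runs on warmed-up code.
result = matmul(MAT_A, MAT_B)
raise "matmul returned a wrong result: #{result.inspect}" unless result == EXPECTED
```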

@eregon
Contributor

eregon commented Jan 8, 2024

> @eregon It appears that TruffleRuby no longer works correctly with cool.io https://github.com/Shopify/yjit-bench/actions/runs/7426843741/job/20211367451. It doesn't seem like the problem of yjit-bench but TruffleRuby's. Could you fix this issue?

Seeing this now. Thanks, I'll take a look. Let's track it in #262

Contributor

@eregon eregon left a comment

Taking a quick look at the diff, it's rather clear the Ruby code there is not idiomatic and is formatted in a rather unusual way. That is expected given the note in https://github.com/attractivechaos/plb2: "As I am mostly a C programmer, implementations in other languages may be suboptimal and there are no implementations in functional languages."

While these look like mostly style differences, there are also performance implications, for example when using for loops instead of each and other loop methods.

I think it would be worth improving these benchmarks, and also try to upstream that, so it looks like regular/typical Ruby code, and not C translated to Ruby, which at least looks unrepresentative of typical Ruby code.
My guess is after small tweaks it's actually reasonable code, but right now it's hard because even the indentation is unusual and displays very poorly (at least on GitHub).
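As a small illustration of the for versus each point (a sketch with hypothetical method names, not code from the benchmarks, and making no claim about which form is faster on a given Ruby implementation): both loops compute the same sum, but `for` does not open a new scope, so its loop variable is still visible afterwards.

```ruby
# C-style translation: `for` keeps the loop variable in the method's scope.
def sum_with_for(n)
  total = 0
  for i in 0...n
    total += i
  end
  [total, i] # i is still visible here; it holds the last value iterated
end

# More typical Ruby: the block parameter is local to the block.
def sum_with_each(n)
  total = 0
  (0...n).each { |i| total += i }
  total
end

p sum_with_for(4)  # prints [6, 3]
p sum_with_each(4) # prints 6
```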


def matgen(n)
tmp = 1.0 / n / n
a = Array.new(n) { Array.new(n) { 0 } }

Contributor

This is an example of something suboptimal and somewhat unidiomatic: Array.new(n, 0) is much faster.

Contributor

Whereas I looked at that and thought, "oh, sometimes I see people use the default value like this to avoid having to do early initialization, I'm glad we're getting a benchmark for it".

But you're right, it's the exception and not the rule.

Contributor

The outer Array.new needs to use a block, so that form is getting benchmarked anyway :)
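A short sketch of why the outer call needs the block form while the inner one does not (the method names here are hypothetical, chosen for illustration): with the two-argument form every slot refers to the same object, which is harmless for an immutable 0 but means every "row" is the same array.

```ruby
# Two-argument form: all n slots reference the SAME inner array.
def shared_rows(n)
  Array.new(n, Array.new(n, 0))
end

# Block form for the outer array: the block runs once per slot, so each row
# is a distinct array. The inner rows can still use Array.new(n, 0).
def distinct_rows(n)
  Array.new(n) { Array.new(n, 0) }
end

shared = shared_rows(3)
shared[0][0] = 1
p shared   # prints [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

grid = distinct_rows(3)
grid[0][0] = 1
p grid     # prints [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```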

@maximecb maximecb merged commit da7585a into main Jan 8, 2024
3 of 4 checks passed
@maximecb maximecb deleted the new_benchmarks branch January 8, 2024 17:17
@nixme

nixme commented Jan 11, 2024

I took a stab at cleaning them up, which got accepted: https://github.com/attractivechaos/plb2/tree/bbff5fba14b7ad55eac1ec1faa6f3db8798bcc70/src/ruby

Timings improved for matmul and sudoku:

  • matmul: before: 130.51s, after: 64.95s
  • sudoku: before: 52.26s, after: 17.47s

@eregon
Contributor

eregon commented Jan 11, 2024

Nice!

@maximecb
Contributor Author

Feel free to open a PR to update the benchmarks here as well.

@eregon
Contributor

eregon commented Jan 12, 2024

> Feel free to open a PR to update the benchmarks here as well.

I did it in #268

@maximecb
Contributor Author

Thanks Benoit


5 participants